Module 0: Lecture Notes 2

Introduction to Google Cloud and Colab

Lecture covering Google Cloud and Colab Setup.
Published

November 21, 2024

Modified

February 17, 2026

1 What is Google Colab?

Google Colab is a product from Google Research that allows anyone to write and execute arbitrary Python code through the browser. It’s especially well-suited for machine learning, data analysis, and educational purposes. With Colab, you can leverage the power of GPUs and TPUs for free, making it a popular choice for resource-intensive tasks.

2 Why Use Google Colab?

  • Free Access to Powerful Hardware: Google Colab provides free access to GPUs and TPUs, which can significantly speed up computations compared to a standard CPU.
  • No Installation Required: Being a cloud-based service, there’s no need to install any software on your computer. All you need is a web browser and a Google account.
  • Collaborative Features: Just like Google Docs, Colab notebooks can be shared and edited by multiple users in real-time, making collaboration seamless.
  • Integration with Google Drive: You can save your notebooks directly to your Google Drive, ensuring easy access and sharing.

3 Getting Started

3.1 Sign Up for a Google Cloud and Colab Account

  • Use your Boston University account to sign up for Google Cloud Platform (GCP). This will give you access to various Google Cloud services, including Colab and a $300 credit for 90 days.
  • Visit Colab Signup
    • Select Colab Pro and follow the prompts.
    • You will need your BU ID to take a picture for verification purposes.
    • You will need to add a payment method, but you will not be charged until you exceed the free tier limits.
    • You will need a government-issued ID for verification purposes.

Colab Pro Sign Up

3.2 Creating a New Notebook

  1. Once on the Colab homepage, click the New notebook button. You might get a pop-up with a New notebook button in the lower left corner. If not, you can always click File → New notebook.

  2. This will open a new tab with a fresh notebook where you can start writing and executing Python code.

3.3 Changing Run-Time

When running deep-learning scripts, you will need to change from CPU to GPU. To do this, click Runtime → Change runtime type, then select T4 GPU. Then click Save.

Important

You are encouraged to sign up to Google Colab with an existing Google account. If you create a new Google account, your GPU usage may be cut off.

3.4 Run Python on Your Notebook

  1. To execute Python code, click the ‘run’ button to the left of the cell. This is the circle with the triangle inside.

  2. To change the title of your Google Colab notebook, click on the title and rename it to AD698_your-bu-userid_SP2025_HW01.ipynb.

  3. To save a copy of your notebook on your local computer, navigate to File → Download and pick the appropriate extension.
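As a first cell to try with the run button, here is a minimal sketch that reports which Python runtime is executing your notebook. The function name describe_runtime is only illustrative.

```python
import platform
import sys


def describe_runtime():
    """Return a one-line summary of the Python runtime executing this cell."""
    version = ".".join(str(part) for part in sys.version_info[:3])
    return f"Python {version} on {platform.system()}"


summary = describe_runtime()
print(summary)
```

Running the same cell locally and in Colab is a quick way to see whether the two environments use different Python versions.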

Google Colab is a powerful tool that allows access to advanced computational resources. Whether you’re just starting out in data science or are a seasoned professional, Colab offers a flexible, collaborative, and resource-rich environment. So, dive in, experiment, and take advantage of everything Colab has to offer!

4 Accessing data on Google Drive

The code below mounts your Google Drive at /content/drive/, so your files appear under /content/drive/My Drive.

If google.colab cannot be imported, the code assumes that the notebook is running somewhere other than Colab, skips the mount, and falls back to a local data/ directory.

try:
    from google.colab import drive
    drive.mount("/content/drive/", force_remount=True)
    google_drive_prefix = "/content/drive/My Drive"
    data_prefix = "{}/mnist/".format(google_drive_prefix)
except ModuleNotFoundError: 
    data_prefix = "data/"

After that, your data are available as ordinary local files:

import gzip

f = gzip.open("{}train-images-idx3-ubyte.gz".format(data_prefix), 'r')
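To illustrate what reading such a file can look like, here is a minimal sketch that parses the header of a gzip-compressed image file. It assumes the file follows the standard MNIST IDX3 layout (a magic number of 2051 followed by the image count, rows, and columns, each a big-endian 32-bit integer). The helper read_idx3_header and the tiny demo file are hypothetical stand-ins so the sketch runs without the real dataset.

```python
import gzip
import os
import struct
import tempfile


def read_idx3_header(path):
    """Read the 16-byte header of a gzip-compressed IDX3 image file."""
    with gzip.open(path, "rb") as f:
        magic, count, rows, cols = struct.unpack(">IIII", f.read(16))
    if magic != 2051:
        raise ValueError(f"unexpected magic number {magic}")
    return count, rows, cols


# Build a small synthetic file (2 blank 28x28 images) in place of the real data.
demo_dir = tempfile.mkdtemp()
demo_path = os.path.join(demo_dir, "demo-images-idx3-ubyte.gz")
with gzip.open(demo_path, "wb") as f:
    f.write(struct.pack(">IIII", 2051, 2, 28, 28))
    f.write(bytes(2 * 28 * 28))

header = read_idx3_header(demo_path)
print(header)  # (2, 28, 28)
```

With the Drive mounted, you would pass the path built from data_prefix instead of demo_path.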

4.1 Using GPUs

Navigate to Runtime → Change runtime type → Hardware accelerator and pick GPU (or TPU).

Warning: This will reload your runtime, so anything held in memory is lost and your cells need to be run again.
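After switching the runtime, you may want to confirm that a GPU is actually attached. The sketch below is one way to do that, assuming an NVIDIA GPU runtime where the nvidia-smi command-line tool is installed. The function name gpu_visible is hypothetical, and this check does not detect TPUs.

```python
import shutil
import subprocess


def gpu_visible():
    """Return True if nvidia-smi exists and lists at least one GPU."""
    tool = shutil.which("nvidia-smi")
    if tool is None:
        # The tool is absent on CPU-only runtimes and most laptops.
        return False
    # "nvidia-smi -L" prints one line per GPU, each starting with "GPU".
    result = subprocess.run([tool, "-L"], capture_output=True, text=True)
    return result.returncode == 0 and "GPU" in result.stdout


has_gpu = gpu_visible()
print("GPU visible" if has_gpu else "No GPU visible")
```

If this prints "No GPU visible" on Colab, check the runtime type again before starting a long training run.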

5 Integration with GitHub

  1. It is more convenient to work on your code locally (e.g. using JupyterLab or PyCharm).
  2. Use Colab only to execute the code on a GPU and make small changes (e.g. hyperparameter tuning).
  3. Unfortunately, Colab's menus offer no git pull functionality, so once you push new changes to the repository you have to reopen your notebook (File → Open notebook → GitHub → Open notebook in new tab, the square with an arrow next to the notebook name).
  4. The downside is that the runtime has to be reloaded.

Warning: This might create conflicts!

  • However, it is possible to push changes made in Colab to the repository (File → Save a copy in GitHub).
  • This will make a commit to the repository.
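As an alternative to clicking through the menu each time, a notebook stored in a public GitHub repository can also be reached through a Colab URL. The sketch below builds such a link, assuming the colab.research.google.com/github/... pattern used by Colab's own GitHub demo notebook. The function name and the example owner, repository, and notebook names are placeholders.

```python
from urllib.parse import quote


def colab_github_url(owner, repository, path, branch="main"):
    """Build a Colab link for a notebook stored in a GitHub repository."""
    parts = [owner, repository, "blob", branch, path]
    # quote() keeps "/" intact, so notebooks in subfolders also work.
    return "https://colab.research.google.com/github/" + "/".join(
        quote(part) for part in parts
    )


# Placeholder values; substitute your own account and notebook.
url = colab_github_url("your-username", "ColabTesting", "Test_Notebook.ipynb")
print(url)
```

Opening the link loads the version currently on the chosen branch, which is a convenient way to pick up changes you pushed from your local machine.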

5.1 Setting up a repository

The first thing you need is to have a repository for the project. On https://github.com/new you can go ahead and set up a new repository. In this example it will be a public repository, but the same works for private ones.

For this example, I am adding a README.md file, a .gitignore based on the Python template, and an MIT License. Once you have created the repository, you need to make sure it can be accessed from the Google Colab environment.

6 Setting up a fine-grained access token on GitHub

Now we need a token. Personal access tokens are special passwords that you can configure with different permissions on your GitHub account. On the GitHub website, click your profile photo and then click Settings in the menu that appears.

Now scroll down the left sidebar and click on Developer settings and then click on Fine-grained tokens under Personal access tokens.

This will open a new tab and you will be able to see a button for creating a new token. For this example, I am setting up this token for 30 days and selecting this repository only. (For repositories under an organization, change the resource owner to the organization where the repository is stored.)

Under Repository permissions, set the access for Contents to Read and write.

Now scroll down and click Generate token. This will show you the list of tokens, and you will be able to copy the new one. Copy it right away, because GitHub displays the token value only once.
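Because the token works like a password, you may prefer not to paste it directly into a notebook. The sketch below is one possible approach, assuming you have stored the token in Colab's Secrets panel under the hypothetical name GITHUB_TOKEN and granted the notebook access to it. Outside Colab it falls back to an environment variable of the same name.

```python
import os


def load_github_token(secret_name="GITHUB_TOKEN"):
    """Fetch the token from Colab secrets, or from the environment elsewhere."""
    try:
        from google.colab import userdata
    except ImportError:
        # Not running in Colab: read an environment variable instead.
        return os.environ.get(secret_name, "")
    # Raises an error if the secret is missing or access was not granted.
    return userdata.get(secret_name)


token = load_github_token()
print("Token loaded" if token else "No token found")
```

The returned value can then be used wherever the notes use a token variable, without the token itself ever being saved in the notebook file.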

Now we can go to Google Drive to set up our workspace and git commands.

7 Setting up Google Drive and Colab

On Google Drive, set up a folder for your repositories. In my case, I created a folder called Shared. Now create a new Google Colab file in this folder. This Colab file will run the git commands. In my case, I am calling it Git.ipynb.

The first cell of this notebook will mount your Google Drive:

from google.colab import drive
drive.mount('/content/drive')

The next cell will clone the repository (if it has not been cloned yet) and configure git's global user name and email.

import os
import subprocess

# Repository name
repository = "ColabTesting"

# Base path
base = "/content/drive/MyDrive/Shared"

# Specify the folder path
folder_path = f"{base}/{repository}"  # Change this to the desired folder path

# Username
username = "<your-username>"

# User email
email = "<your-email>"

# Full name
name = "<your-full-name>"

# Fine-grained personal access token from GitHub (treat it like a password)
token = "<token>"

# Owner
owner = username

# Move to the base folder that holds your repositories
%cd {base}

# Check if folder exists
if not os.path.exists(folder_path):
    clone_url = f'https://{username}:{token}@github.com/{owner}/{repository}.git'
    # Clone repository from GitHub
    !git clone {clone_url}
else:
    print(f"Folder '{folder_path}' already exists.")

# Move to folder
%cd {folder_path}

!git pull

# Update .gitconfig using subprocess (each argument is a separate list item)
subprocess.run(['git', 'config', '--global', 'user.email', email], check=True)
subprocess.run(['git', 'config', '--global', 'user.name', name], check=True)

The following cells are for git commands.

Pulling changes from remote:

!git pull

Checking the status:

!git status

Staging all local changes:

!git add --all

Committing the changes with a message:

!git commit -m "Update"

Push it back to remote:

!git push
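The ! prefix only works inside a notebook. If you would rather run the same sequence from plain Python, the sketch below shows one way to do it with subprocess. The function names sync_commands and run_sync are hypothetical, and the sketch assumes git is installed and the folder has already been cloned.

```python
import subprocess


def sync_commands(message="Update"):
    """Return the git commands from the cells above, in the same order."""
    return [
        ["git", "pull"],
        ["git", "status"],
        ["git", "add", "--all"],
        ["git", "commit", "-m", message],
        ["git", "push"],
    ]


def run_sync(repo_dir, message="Update"):
    """Run each command inside repo_dir and stop at the first failure."""
    for argv in sync_commands(message):
        result = subprocess.run(argv, cwd=repo_dir, capture_output=True, text=True)
        print(result.stdout, end="")
        if result.returncode != 0:
            # git commit exits non-zero when there is nothing to commit,
            # so the loop stops here and the push is skipped.
            print(result.stderr, end="")
            break


# Example call (not executed here):
# run_sync("/content/drive/MyDrive/Shared/ColabTesting")
```

Keeping the command list separate from the runner makes it easy to change the commit message or inspect what will be executed first.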

8 Testing the workflow

Now you have a new folder on Google Drive. Let's create a notebook there and test the git workflow.

Now we can come back to Git.ipynb and run the whole notebook. It will skip the clone command because the folder already exists, pull changes from the remote, stage the new notebook, commit it, and push it back to the remote.

In my case, the first !git status returned:

On branch main
Your branch is up to date with 'origin/main'.

Untracked files:
    (use "git add <file>..." to include in what will be committed)
    Test_Notebook.ipynb

nothing added to commit but untracked files present (use "git add" to track)

!git commit -m "Update" returned:

[main 52968bb] Update
1 file changed, 1 insertion(+)
create mode 100644 Test_Notebook.ipynb

The second !git status returned:

On branch main
Your branch is ahead of 'origin/main' by 1 commit.
    (use "git push" to publish your local commits)

nothing to commit, working tree clean

And the !git push returned:

Enumerating objects: 4, done. 
Counting objects: 100% (4/4), done.
Delta compression using up to 2 threads
Compressing objects: 100% (3/3), done.
Writing objects: 100% (3/3), 501 bytes | 125.00 KiB/s, done.
Total 3 (delta 1), reused 0 (delta 0)
remote: Resolving deltas: 100% (1/1), completed with 1 local object.
To https://github.com/iuryt/ColabTesting.git
    dfaaaea..52968bb main -> main

Now, every time you make a change in the repository, you come back to this file and run the git commands.
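If you want to check from code whether there is anything new to stage before running the commands, git status --porcelain prints a compact, script-friendly listing in which untracked files are prefixed with ??. The sketch below parses such output. The function name untracked_files is hypothetical, and the sample text is made up for illustration rather than captured from the repository above.

```python
def untracked_files(porcelain_output):
    """Extract untracked paths from `git status --porcelain` output."""
    return [
        line[3:]
        for line in porcelain_output.splitlines()
        if line.startswith("?? ")
    ]


# Hypothetical sample: one untracked notebook and one modified file.
sample = "?? Test_Notebook.ipynb\n M README.md\n"
print(untracked_files(sample))  # ['Test_Notebook.ipynb']
```

In a notebook, the real output could be captured with subprocess.run(["git", "status", "--porcelain"], capture_output=True, text=True).stdout and passed to the same function.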